
Conversation

@lifflander
Collaborator

Fixes #2435

@lifflander lifflander requested review from cwschilly, cz4rs and nlslatt May 27, 2025 19:39
@lifflander lifflander linked an issue May 27, 2025 that may be closed by this pull request
3 tasks
@github-actions

github-actions bot commented May 27, 2025

clang-format output for this changeset:

diff --git a/src/vt/vrt/collection/balance/temperedlb/temperedlb.cc b/src/vt/vrt/collection/balance/temperedlb/temperedlb.cc
index 1c8ba29fb..7e858bef8 100644
--- a/src/vt/vrt/collection/balance/temperedlb/temperedlb.cc
+++ b/src/vt/vrt/collection/balance/temperedlb/temperedlb.cc
@@ -79,9 +79,8 @@ bool TemperedLB::isOverloaded(LoadType load) const {
 /*static*/ std::unordered_map<std::string, std::string>
 TemperedLB::getInputKeysWithHelp() {
   std::unordered_map<std::string, std::string> const keys_help = {
-    {
-      "knowledge",
-      R"(
+    {"knowledge",
+     R"(
 Values: {UserDefined, Complete, Log}
 Default: Log
 Description:
@@ -93,51 +92,41 @@ Description:
       rounds must be explicitly set, but not both.  The relationship between
       rounds and fanout will be approximately
       rounds = log(num_ranks)/log(fanout).
-)"
-    },
-    {
-      "fanout",
-      R"(
+)"},
+    {"fanout",
+     R"(
 Values: <uint16_t>
 Default: N/A
 Description:
   The number of ranks each underloaded rank will communicate with. May be
   determined automatically by an appropriate choice for knowledge.
-)"
-    },
-    {
-      "rounds",
-      R"(
+)"},
+    {"rounds",
+     R"(
 Values: <uint8_t>
 Default: N/A
 Description:
   The number of information propagation rounds. May be determined automatically
   by an appropriate choice for knowledge.
-)"
-    },
-    {
-      "iters",
-      R"(
+)"},
+    {"iters",
+     R"(
 Values: <uint16_t>
 Default: 4
 Description:
   The number of iterations of the information propagation and transfer steps.
-)"
-    },
-    {
-      "trials",
-      R"(
+)"},
+    {"trials",
+     R"(
 Values: <uint16_t>
 Default: 1
 Description:
   How many times to repeat the requested number of iterations, hoping to find
   a better imbalance. Increasing this helps if it's easy to get stuck in a
   local minimum.
-)"
-    },
-    {
-      "criterion",
-      R"(
+)"},
+    {"criterion",
+     R"(
 Values: {Grapevine, ModifiedGrapevine}
 Default: ModifiedGrapevine
 Description:
@@ -147,11 +136,9 @@ Description:
     ModifiedGrapevine: accept if the proposed transfer will not make the load of
       the recipient higher than was the load of the sender immediately before
       proposing the transfer.
-)"
-    },
-    {
-      "inform",
-      R"(
+)"},
+    {"inform",
+     R"(
 Values: {SyncInform, AsyncInform}
 Default: AsyncInform
 Description:
@@ -166,11 +153,9 @@ Description:
       loads when the first message for a round is received, avoiding the
       synchronization cost but delaying the propagation of some information
       until the following round.
-)"
-    },
-    {
-      "transfer",
-      R"(
+)"},
+    {"transfer",
+     R"(
 Values: {Original, Recursive, SwapClusters}
 Default: Original
 Description:
@@ -188,11 +173,9 @@ Description:
       This is especially useful when shared memory constraints are present,
       as breaking shared memory clusters results in higher overall memory
       footprint, in contrast with whole cluster swaps.
-)"
-    },
-    {
-      "ordering",
-      R"(
+)"},
+    {"ordering",
+     R"(
 Values: {Arbitrary, ElmID, FewestMigrations, SmallObjects, LargestObjects}
 Default: FewestMigrations
 Description:
@@ -210,11 +193,9 @@ Description:
       then by descending load for objects with smaller loads, and finally by
       ascending load for objects with larger loads.
     LargestObjects: order by descending load.
-)"
-    },
-    {
-      "cmf",
-      R"(
+)"},
+    {"cmf",
+     R"(
 Values: {Original, NormByMax, NormByMaxExcludeIneligible}
 Default: NormByMax
 Description:
@@ -234,11 +215,9 @@ Description:
       greater of the target load and the load of the most loaded processor in
       the CMF. Only include processors in the CMF that will pass the chosen
       Criterion for the object being considered for transfer.
-)"
-    },
-    {
-      "deterministic",
-      R"(
+)"},
+    {"deterministic",
+     R"(
 Values: {true, false}
 Default: false
 Description:
@@ -247,88 +226,69 @@ Description:
   using a driver that feeds the load balancer object loads read from LB data
   files.  Enabling this requires choosing options for inform and ordering that
   are themselves deterministic.
-)"
-    },
-    {
-      "rollback",
-      R"(
+)"},
+    {"rollback",
+     R"(
 Values: {true, false}
 Default: true
 Description:
   If the final iteration of a trial has a worse imbalance than any earlier
   iteration, it will roll back to the iteration with the best imbalance.
   If transfer_strategy is SwapClusters, rollback is automatically set to false.
-)"
-    },
-    {
-      "targetpole",
-      R"(
+)"},
+    {"targetpole",
+     R"(
 Values: {true, false}
 Default: false
 Description:
   When an object load exceeds the processor-average load (i.e., we have a "long
   pole"), adjust the target load to be the maximum object load ("longest pole")
   instead of the processor-average load.
-)"
-    },
-    {
-      "memory_threshold",
-      R"(
+)"},
+    {"memory_threshold",
+     R"(
 Values: <double>
 Default: 0
 Description: The memory threshold TemperedLB should strictly stay under which is
 respected if memory information is present in the user-defined data.
-)"
-    },
-    {
-      "alpha",
-      R"(
+)"},
+    {"alpha",
+     R"(
 Values: <double>
 Default: 1.0
 Description: α in the work model (load in work model)
-)"
-    },
-    {
-      "beta",
-      R"(
+)"},
+    {"beta",
+     R"(
 Values: <double>
 Default: 0.0
 Description: β in the work model (inter-node communication in work model)
-)"
-    },
-    {
-    "gamma",
-      R"(
+)"},
+    {"gamma",
+     R"(
 Values: <double>
 Default: 0.0
 Description: γ in the work model (intra-node communication in work model)
-)"
-    },
-    {
-    "delta",
-      R"(
+)"},
+    {"delta",
+     R"(
 Values: <double>
 Default: 0.0
 Description: δ in the work model (shared-memory-edges in work model)
-)"
-    },
-    {
-    "epsilon",
-      R"(
+)"},
+    {"epsilon",
+     R"(
 Values: <double>
 Default: infinity
 Description: ε in the work model (memory term in work model)
-)"
-    },
-    {
-    "converge_tolerance",
-    R"(
+)"},
+    {"converge_tolerance",
+     R"(
 Values: <double>
 Default: 0.01
 Description: The relative tolerance across a window of 8 iterations of work
 to define convergence.
-)"
-    }
+)"}
   };
   return keys_help;
 }
@@ -401,7 +361,7 @@ void TemperedLB::inputParams(balance::ConfigEntry* config) {
       k_max_ = std::max(
         static_cast<uint16_t>(1),
         static_cast<uint16_t>(
-          std::ceil(std::sqrt(std::log(num_nodes)/std::log(2.0)))
+          std::ceil(std::sqrt(std::log(num_nodes) / std::log(2.0)))
         )
       );
     }
@@ -444,9 +404,8 @@ void TemperedLB::inputParams(balance::ConfigEntry* config) {
   target_pole_   = config->getOrDefault<bool>("targetpole", target_pole_);
   mem_thresh_    = config->getOrDefault<double>("memory_threshold", mem_thresh_);
 
-  converge_tolerance_ = config->getOrDefault<double>(
-    "converge_tolerance", converge_tolerance_
-  );
+  converge_tolerance_ =
+    config->getOrDefault<double>("converge_tolerance", converge_tolerance_);
 
   balance::LBArgsEnumConverter<CriterionEnum> criterion_converter_(
     "criterion", "CriterionEnum", {
@@ -561,9 +520,10 @@ void TemperedLB::runLB(LoadType total_load) {
   if (theContext()->getNode() == 0) {
     vt_debug_print(
       terse, temperedlb,
-      "TemperedLB::runLB: avg={}, max={}, pole={}, imb={}, load={}, should_lb={}, memory_threshold={}, converge_tolerance={}\n",
-      LoadType(avg), LoadType(max), LoadType(pole), imb,
-      LoadType(load), should_lb, mem_thresh_, converge_tolerance_
+      "TemperedLB::runLB: avg={}, max={}, pole={}, imb={}, load={}, "
+      "should_lb={}, memory_threshold={}, converge_tolerance={}\n",
+      LoadType(avg), LoadType(max), LoadType(pole), imb, LoadType(load),
+      should_lb, mem_thresh_, converge_tolerance_
     );
 
     if (!should_lb) {
@@ -655,16 +615,14 @@ void TemperedLB::readClustersMemoryData() {
 
 void TemperedLB::makeClusterSummaryAddEdges(
   SharedIDType shared_id, ClusterInfo& info,
-  std::set<ObjIDType> const& cluster_objs,
-  ObjIDType obj, bool is_send,
+  std::set<ObjIDType> const& cluster_objs, ObjIDType obj, bool is_send,
   std::vector<std::tuple<elm::ElementIDStruct, double>> const& edges
 ) {
   for (auto const& [send_or_recv_obj, volume] : edges) {
     vt_debug_print(
       verbose, temperedlb,
       "computeClusterSummary: shared_id={} send obj={}, recv_obj={}\n",
-      shared_id,
-      is_send ? obj : send_or_recv_obj,
+      shared_id, is_send ? obj : send_or_recv_obj,
       is_send ? send_or_recv_obj : obj
     );
 
@@ -674,10 +632,8 @@ void TemperedLB::makeClusterSummaryAddEdges(
 
       // inter-cluster edge
       info.addInterClusterEdge(is_send, shared_id, volume);
-    } else if (
-      auto it2 = obj_shared_block_.find(send_or_recv_obj);
-      it2 != obj_shared_block_.end()
-    ) {
+    } else if (auto it2 = obj_shared_block_.find(send_or_recv_obj);
+               it2 != obj_shared_block_.end()) {
       // inter-cluster edge
       info.addInterClusterEdge(is_send, it2->second, volume);
 
@@ -685,8 +641,7 @@ void TemperedLB::makeClusterSummaryAddEdges(
         verbose, temperedlb,
         "computeClusterSummary: ADDING inter shared_id={} send obj={}, "
         "recv_obj={}\n",
-        shared_id,
-        is_send ? obj : send_or_recv_obj,
+        shared_id, is_send ? obj : send_or_recv_obj,
         is_send ? send_or_recv_obj : obj
       );
 
@@ -895,8 +850,7 @@ double TemperedLB::computeWork(
 }
 
 WorkBreakdown TemperedLB::computeWorkBreakdown(
-  NodeType node,
-  std::unordered_map<ObjIDType, LoadType> const& objs,
+  NodeType node, std::unordered_map<ObjIDType, LoadType> const& objs,
   std::set<ObjIDType> const& exclude,
   std::unordered_map<ObjIDType, LoadType> const& include
 ) const {
@@ -911,10 +865,8 @@ WorkBreakdown TemperedLB::computeWorkBreakdown(
     if (exclude.find(obj) == exclude.end()) {
       if (auto it = send_edges_.find(obj); it != send_edges_.end()) {
         for (auto const& [target, volume] : it->second) {
-          if (
-            objs.find(target) != objs.end() or
-            non_cluster_objs_.find(target) != non_cluster_objs_.end()
-          ) {
+          if (objs.find(target) != objs.end() or
+              non_cluster_objs_.find(target) != non_cluster_objs_.end()) {
             intra_rank_bytes_sent += volume;
           } else {
             inter_rank_bytes_sent += volume;
@@ -923,10 +875,8 @@ WorkBreakdown TemperedLB::computeWorkBreakdown(
       }
       if (auto it = recv_edges_.find(obj); it != recv_edges_.end()) {
         for (auto const& [target, volume] : it->second) {
-          if (
-            objs.find(target) != objs.end() or
-            non_cluster_objs_.find(target) != non_cluster_objs_.end()
-          ) {
+          if (objs.find(target) != objs.end() or
+              non_cluster_objs_.find(target) != non_cluster_objs_.end()) {
             intra_rank_bytes_recv += volume;
           } else {
             inter_rank_bytes_recv += volume;
@@ -1001,8 +951,7 @@ double TemperedLB::computeWorkAfterClusterSwap(
     verbose, temperedlb,
     "computeWorkAfterClusterSwap: node_work={}, to_remove.load={}, intra={}, "
     "inter={}\n",
-    node_work,
-    to_remove.load,
+    node_work, to_remove.load,
     gamma * std::max(node_intra_send, node_intra_recv),
     beta * std::max(node_inter_send, node_inter_recv)
   );
@@ -1015,10 +964,9 @@ double TemperedLB::computeWorkAfterClusterSwap(
   node_work -= gamma * std::max(node_intra_send, node_intra_recv);
 
   vt_debug_print(
-    verbose, temperedlb,
-    "node_work (after gamma)={}, sub off={} {} {}\n",
-    node_work, std::max(node_inter_send, node_inter_recv),
-    beta, beta * std::max(node_inter_send, node_inter_recv)
+    verbose, temperedlb, "node_work (after gamma)={}, sub off={} {} {}\n",
+    node_work, std::max(node_inter_send, node_inter_recv), beta,
+    beta * std::max(node_inter_send, node_inter_recv)
   );
 
   node_work -= beta * std::max(node_inter_send, node_inter_recv);
@@ -1140,29 +1088,23 @@ double TemperedLB::computeWorkAfterClusterSwap(
   }
 
   vt_debug_print(
-    verbose, temperedlb,
-    "node_work={}, intra {} {} inter {} {}\n",
-    node_work,
+    verbose, temperedlb, "node_work={}, intra {} {} inter {} {}\n", node_work,
     node_intra_send, node_intra_recv, node_inter_send, node_inter_recv
   );
 
   node_work += gamma * std::max(node_intra_send, node_intra_recv);
-  node_work += beta  * std::max(node_inter_send, node_inter_recv);
+  node_work += beta * std::max(node_inter_send, node_inter_recv);
 
   // Uninitialized destination means that the cluster is empty
   // If to_remove was remote, remove that component from the work
-  if (
-    to_remove.home_node != node and
-    to_remove.home_node != uninitialized_destination
-  ) {
+  if (to_remove.home_node != node and
+      to_remove.home_node != uninitialized_destination) {
     node_work -= delta * to_remove.edge_weight;
   }
 
   // If to_add is now remote, add that component to the work
-  if (
-    to_add.home_node != node and
-    to_add.home_node != uninitialized_destination
-  ) {
+  if (to_add.home_node != node and
+      to_add.home_node != uninitialized_destination) {
     node_work += delta * to_add.edge_weight;
   }
 
@@ -1247,13 +1189,13 @@ void TemperedLB::doLBStages(LoadType start_imb) {
             auto const to_obj = key.toObj();
             auto const bytes = volume.bytes;
 
-	    if (from_obj.isMigratable() or to_obj.isMigratable()) {
-	      //vt_print(temperedlb, "Found comm: to={}, from={} volume={}\n", to_obj, from_obj, volume.bytes);
+            if (from_obj.isMigratable() or to_obj.isMigratable()) {
+              //vt_print(temperedlb, "Found comm: to={}, from={} volume={}\n", to_obj, from_obj, volume.bytes);
 
-	      send_edges_[from_obj].emplace_back(to_obj, bytes);
-	      recv_edges_[to_obj].emplace_back(from_obj, bytes);
-	      has_comm = true;
-	    }
+              send_edges_[from_obj].emplace_back(to_obj, bytes);
+              recv_edges_[to_obj].emplace_back(from_obj, bytes);
+              has_comm = true;
+            }
           } else if (key.commCategory() == elm::CommCategory::WriteShared) {
             auto const to_node = key.toNode();
             auto const shared_id = key.sharedID();
@@ -1276,14 +1218,13 @@ void TemperedLB::doLBStages(LoadType start_imb) {
         if (has_comm_any_) {
           runInEpochCollective("symmEdges", [&]{
             std::unordered_map<NodeType, EdgeMapType> edges;
-            std::unordered_map<elm::ElementIDStruct, SharedIDType> obj_cluster_id;
+            std::unordered_map<elm::ElementIDStruct, SharedIDType>
+              obj_cluster_id;
 
             for (auto const& [from_obj, to_edges] : send_edges_) {
               if (from_obj.getCurrNode() == this_node) {
-                if (
-                  auto it = obj_shared_block_.find(from_obj);
-                  it != obj_shared_block_.end()
-                ) {
+                if (auto it = obj_shared_block_.find(from_obj);
+                    it != obj_shared_block_.end()) {
                   obj_cluster_id[from_obj] = it->second;
                 } else {
                   obj_cluster_id[from_obj] = -1;
@@ -1304,10 +1245,8 @@ void TemperedLB::doLBStages(LoadType start_imb) {
                   edges[curr_to_node][from_obj].emplace_back(to_obj, volume);
                 }
                 if (to_obj.getCurrNode() == this_node) {
-                  if (
-                    auto it = obj_shared_block_.find(to_obj);
-                    it != obj_shared_block_.end()
-                  ) {
+                  if (auto it = obj_shared_block_.find(to_obj);
+                      it != obj_shared_block_.end()) {
                     obj_cluster_id[to_obj] = it->second;
                   } else {
                     obj_cluster_id[to_obj] = -1;
@@ -1328,15 +1267,18 @@ void TemperedLB::doLBStages(LoadType start_imb) {
         this_new_breakdown_ = computeWorkBreakdown(this_node, cur_objs_);
         this_work = this_new_work_ = this_new_breakdown_.work;
 
-	work_stats_handler_ = false;
-	// Perform the reduction for Rank_load_modeled -> processor load only
-	proxy_.allreduce<&TemperedLB::workStatsHandler, collective::PlusOp>(
+        work_stats_handler_ = false;
+        // Perform the reduction for Rank_load_modeled -> processor load only
+        proxy_.allreduce<&TemperedLB::workStatsHandler, collective::PlusOp>(
           std::vector<balance::LoadData>{
             {balance::LoadData{Statistic::Rank_load_modeled, this_new_load_}},
-            {balance::LoadData{Statistic::Rank_strategy_specific_load_modeled, this_new_work_}}
+            {balance::LoadData{
+              Statistic::Rank_strategy_specific_load_modeled, this_new_work_
+            }}
           }
-       );
-       theSched()->runSchedulerWhile([this]{ return not work_stats_handler_; });
+        );
+        theSched()->runSchedulerWhile([this] { return not work_stats_handler_; }
+        );
       } else {
         // Clear out data structures from previous iteration
         selected_.clear();
@@ -1439,15 +1381,18 @@ void TemperedLB::doLBStages(LoadType start_imb) {
       ) {
         this_new_breakdown_ = computeWorkBreakdown(this_node, cur_objs_);
         this_new_work_ = this_new_breakdown_.work;
-	load_stats_handler_ = false;
+        load_stats_handler_ = false;
         // Perform the reduction for Rank_load_modeled -> processor load only
         proxy_.allreduce<&TemperedLB::loadStatsHandler, collective::PlusOp>(
           std::vector<balance::LoadData>{
             {balance::LoadData{Statistic::Rank_load_modeled, this_new_load_}},
-            {balance::LoadData{Statistic::Rank_strategy_specific_load_modeled, this_new_work_}}
+            {balance::LoadData{
+              Statistic::Rank_strategy_specific_load_modeled, this_new_work_
+            }}
           }
         );
-	theSched()->runSchedulerWhile([this]{ return not load_stats_handler_; });
+        theSched()->runSchedulerWhile([this] { return not load_stats_handler_; }
+        );
       }
 
       bool const converged = checkConvergence();
@@ -1466,7 +1411,7 @@ void TemperedLB::doLBStages(LoadType start_imb) {
       }
 
       if (converged) {
-	break;
+        break;
       }
     }
 
@@ -1504,11 +1449,13 @@ void TemperedLB::doLBStages(LoadType start_imb) {
     if (transfer_type_ == TransferTypeEnum::SwapClusters) {
       auto remote_block_count = getRemoteBlockCountHere();
       compute_unhomed_done_ = false;
-      proxy_.reduce<
-        &TemperedLB::remoteBlockCountHandler, collective::PlusOp
-      >(proxy_[0], remote_block_count);
+      proxy_.reduce<&TemperedLB::remoteBlockCountHandler, collective::PlusOp>(
+        proxy_[0], remote_block_count
+      );
       if (this_node == 0) {
-        theSched()->runSchedulerWhile([this]{ return not compute_unhomed_done_; });
+        theSched()->runSchedulerWhile([this] {
+          return not compute_unhomed_done_;
+        });
       }
     }
   } else if (this_node == 0) {
@@ -1519,8 +1466,9 @@ void TemperedLB::doLBStages(LoadType start_imb) {
   }
 
   double end_time = MPI_Wtime();
-  proxy_.allreduce<&TemperedLB::timeLB,
-		   collective::MaxOp>(end_time-start_time);
+  proxy_.allreduce<&TemperedLB::timeLB, collective::MaxOp>(
+    end_time - start_time
+  );
 
   // Concretize lazy migrations by invoking the BaseLB object migration on new
   // object node assignments
@@ -1539,9 +1487,8 @@ bool TemperedLB::checkConvergence(std::size_t last_num_iters) const {
     double w_max_rel_d = (w_max_i - w_max_in) / last_num_iters;
     if (theContext()->getNode() == 0) {
       vt_debug_print(
-        terse, temperedlb,
-        "i={} in={} rel={}, tol={}, rel_tol={}\n",
-        w_max_i, w_max_in, w_max_rel_d, converge_tolerance_,
+        terse, temperedlb, "i={} in={} rel={}, tol={}, rel_tol={}\n", w_max_i,
+        w_max_in, w_max_rel_d, converge_tolerance_,
         converge_tolerance_ * w_max_in
       );
     }
@@ -1555,7 +1502,10 @@ bool TemperedLB::checkConvergence(std::size_t last_num_iters) const {
 void TemperedLB::timeLB(double total_time) {
   auto const this_node = theContext()->getNode();
   if (this_node == 0) {
-    vt_print(temperedlb, "total time={}, per iter={}\n", total_time, total_time/total_num_iters_);
+    vt_print(
+      temperedlb, "total time={}, per iter={}\n", total_time,
+      total_time / total_num_iters_
+    );
   }
 }
 
@@ -1681,7 +1631,7 @@ void TemperedLB::informAsync() {
 
   // Underloaded start the round
   if (is_underloaded_) {
-    runInEpochRooted("informAsync", [&]{
+    runInEpochRooted("informAsync", [&] {
       uint8_t k_cur_async = 0;
       propagateRound(k_cur_async, false, no_epoch);
     });
@@ -1689,7 +1639,7 @@ void TemperedLB::informAsync() {
 
   props_done_ = false;
   proxy_.allreduce<&TemperedLB::propsDone>();
-  theSched()->runSchedulerWhile([this]{ return not props_done_ ; });
+  theSched()->runSchedulerWhile([this] { return not props_done_; });
 
   if (is_overloaded_) {
     vt_debug_print(
@@ -1837,10 +1787,15 @@ void TemperedLB::propagateRound(uint8_t k_cur, bool sync, EpochType epoch) {
         envelopeSetEpoch(msg->env, epoch);
       }
       NodeInfo info{
-        this_new_load_, this_new_work_,
-        this_new_breakdown_.inter_send_vol, this_new_breakdown_.inter_recv_vol,
-        this_new_breakdown_.intra_send_vol, this_new_breakdown_.intra_recv_vol,
-        this_new_breakdown_.shared_vol, getSharedBlocksHere(), non_cluster_objs_
+        this_new_load_,
+        this_new_work_,
+        this_new_breakdown_.inter_send_vol,
+        this_new_breakdown_.inter_recv_vol,
+        this_new_breakdown_.intra_send_vol,
+        this_new_breakdown_.intra_recv_vol,
+        this_new_breakdown_.shared_vol,
+        getSharedBlocksHere(),
+        non_cluster_objs_
       };
       msg->addNodeInfo(this_node, info);
       if (has_memory_data_) {
@@ -1856,10 +1811,15 @@ void TemperedLB::propagateRound(uint8_t k_cur, bool sync, EpochType epoch) {
         envelopeSetEpoch(msg->env, epoch);
       }
       NodeInfo info{
-        this_new_load_, this_new_work_,
-        this_new_breakdown_.inter_send_vol, this_new_breakdown_.inter_recv_vol,
-        this_new_breakdown_.intra_send_vol, this_new_breakdown_.intra_recv_vol,
-        this_new_breakdown_.shared_vol, getSharedBlocksHere(), non_cluster_objs_
+        this_new_load_,
+        this_new_work_,
+        this_new_breakdown_.inter_send_vol,
+        this_new_breakdown_.inter_recv_vol,
+        this_new_breakdown_.intra_send_vol,
+        this_new_breakdown_.intra_recv_vol,
+        this_new_breakdown_.shared_vol,
+        getSharedBlocksHere(),
+        non_cluster_objs_
       };
       msg->addNodeInfo(this_node, info);
       if (has_memory_data_) {
@@ -2448,12 +2408,8 @@ auto TemperedLB::removeClusterToSend(
   }
 
   return std::make_tuple(
-    give_objs,
-    give_obj_shared_block,
-    give_shared_blocks_size,
-    give_obj_working_bytes,
-    give_send,
-    give_recv
+    give_objs, give_obj_shared_block, give_shared_blocks_size,
+    give_obj_working_bytes, give_send, give_recv
   );
 }
 
@@ -2485,10 +2441,15 @@ void TemperedLB::considerSwapsAfterLock(MsgSharedPtr<LockedInfoMsg> msg) {
   auto const this_node = theContext()->getNode();
 
   NodeInfo this_info{
-    this_new_load_, this_new_work_,
-    this_new_breakdown_.inter_send_vol, this_new_breakdown_.inter_recv_vol,
-    this_new_breakdown_.intra_send_vol, this_new_breakdown_.intra_recv_vol,
-    this_new_breakdown_.shared_vol, getSharedBlocksHere(), non_cluster_objs_
+    this_new_load_,
+    this_new_work_,
+    this_new_breakdown_.inter_send_vol,
+    this_new_breakdown_.inter_recv_vol,
+    this_new_breakdown_.intra_send_vol,
+    this_new_breakdown_.intra_recv_vol,
+    this_new_breakdown_.shared_vol,
+    getSharedBlocksHere(),
+    non_cluster_objs_
   };
 
   auto criterion = [&,this](
@@ -2614,14 +2575,8 @@ void TemperedLB::considerSwapsAfterLock(MsgSharedPtr<LockedInfoMsg> msg) {
         src_shared_id, try_shared_id, theMsg()->getEpoch()
       );
       proxy_[try_rank].template send<&TemperedLB::giveCluster>(
-        this_node,
-        give_shared_blocks_size,
-        give_objs,
-        give_obj_shared_block,
-        give_obj_working_bytes,
-        give_send,
-        give_recv,
-        try_shared_id
+        this_node, give_shared_blocks_size, give_objs, give_obj_shared_block,
+        give_obj_working_bytes, give_send, give_recv, try_shared_id
       );
     });
 
@@ -2636,7 +2591,9 @@ void TemperedLB::considerSwapsAfterLock(MsgSharedPtr<LockedInfoMsg> msg) {
     );
   }
 
-  proxy_[try_rank].template send<&TemperedLB::releaseLock>(false, this_node, 0.0);
+  proxy_[try_rank].template send<&TemperedLB::releaseLock>(
+    false, this_node, 0.0
+  );
 
   vt_debug_print(
     verbose, temperedlb,
@@ -2664,8 +2621,7 @@ void TemperedLB::giveCluster(
   std::unordered_map<ObjIDType, LoadType> const& give_objs,
   std::unordered_map<ObjIDType, SharedIDType> const& give_obj_shared_block,
   std::unordered_map<ObjIDType, BytesType> const& give_obj_working_bytes,
-  EdgeMapType const& give_send,
-  EdgeMapType const& give_recv,
+  EdgeMapType const& give_send, EdgeMapType const& give_recv,
   SharedIDType take_cluster
 ) {
   auto const this_node = theContext()->getNode();
@@ -2696,24 +2652,12 @@ void TemperedLB::giveCluster(
   }
 
   if (take_cluster != no_shared_id) {
-    auto const& [
-      take_objs,
-      take_obj_shared_block,
-      take_shared_blocks_size,
-      take_obj_working_bytes,
-      take_send,
-      take_recv
-    ] = removeClusterToSend(take_cluster);
+    auto const& [take_objs, take_obj_shared_block, take_shared_blocks_size, take_obj_working_bytes, take_send, take_recv] =
+      removeClusterToSend(take_cluster);
 
     proxy_[from_rank].template send<&TemperedLB::giveCluster>(
-      this_node,
-      take_shared_blocks_size,
-      take_objs,
-      take_obj_shared_block,
-      take_obj_working_bytes,
-      take_send,
-      take_recv,
-      no_shared_id
+      this_node, take_shared_blocks_size, take_objs, take_obj_shared_block,
+      take_obj_working_bytes, take_send, take_recv, no_shared_id
     );
   }
 
@@ -2738,8 +2682,7 @@ void TemperedLB::giveCluster(
 
 void TemperedLB::tryLock(NodeType requesting_node, double criterion_value) {
   vt_debug_print(
-    normal, temperedlb,
-    "tryLock: requesting_node={}, c_try={}\n",
+    normal, temperedlb, "tryLock: requesting_node={}, c_try={}\n",
     requesting_node, criterion_value
   );
 
@@ -2750,7 +2693,9 @@ void TemperedLB::tryLock(NodeType requesting_node, double criterion_value) {
   }
 }
 
-void TemperedLB::releaseLock(bool try_again, NodeType try_lock_node, double c_try) {
+void TemperedLB::releaseLock(
+  bool try_again, NodeType try_lock_node, double c_try
+) {
   vt_debug_print(
     normal, temperedlb,
     "releaseLock: pending size={}, try_again={}, try_lock_node={}, c_try={}\n",
@@ -2779,12 +2724,12 @@ void TemperedLB::lockObtained(LockedInfoMsg* in_msg) {
     is_locked_, is_swapping_, locking_rank_, msg->locked_node, is_swapping_
   );
 
-  auto action = [this, msg]{
+  auto action = [this, msg] {
     try_locks_pending_--;
 
     vt_debug_print(
-      normal, temperedlb,
-      "try locks pending={} run action\n", try_locks_pending_
+      normal, temperedlb, "try locks pending={} run action\n",
+      try_locks_pending_
     );
 
     considerSwapsAfterLock(msg);
@@ -2793,7 +2738,9 @@ void TemperedLB::lockObtained(LockedInfoMsg* in_msg) {
   if (is_locked_ && locking_rank_ <= msg->locked_node) {
     cycle_locks_++;
     auto const this_node = theContext()->getNode();
-    proxy_[msg->locked_node].template send<&TemperedLB::releaseLock>(true, this_node, in_msg->locked_c_try);
+    proxy_[msg->locked_node].template send<&TemperedLB::releaseLock>(
+      true, this_node, in_msg->locked_c_try
+    );
     try_locks_pending_--;
   } else if (is_locked_) {
     pending_actions_.push_back(action);
@@ -2817,7 +2764,7 @@ void TemperedLB::satisfyLockRequest() {
       vt_debug_print(
         normal, temperedlb,
         "satisfyLockRequest (iterate): node={}, c_try={}, forced_release={}\n",
-	tl.requesting_node, tl.c_try, tl.forced_release
+        tl.requesting_node, tl.c_try, tl.forced_release
       );
     }
 
@@ -2838,16 +2785,20 @@ void TemperedLB::satisfyLockRequest() {
     auto const this_node = theContext()->getNode();
 
     vt_debug_print(
-      normal, temperedlb,
-      "satisfyLockRequest: lock obtained for node={}\n",
+      normal, temperedlb, "satisfyLockRequest: lock obtained for node={}\n",
       lock.requesting_node
     );
 
     NodeInfo this_info{
-      this_new_load_, this_new_work_,
-      this_new_breakdown_.inter_send_vol, this_new_breakdown_.inter_recv_vol,
-      this_new_breakdown_.intra_send_vol, this_new_breakdown_.intra_recv_vol,
-      this_new_breakdown_.shared_vol, getSharedBlocksHere(), non_cluster_objs_
+      this_new_load_,
+      this_new_work_,
+      this_new_breakdown_.inter_send_vol,
+      this_new_breakdown_.inter_recv_vol,
+      this_new_breakdown_.intra_send_vol,
+      this_new_breakdown_.intra_recv_vol,
+      this_new_breakdown_.shared_vol,
+      getSharedBlocksHere(),
+      non_cluster_objs_
     };
 
     proxy_[lock.requesting_node].template send<&TemperedLB::lockObtained>(
@@ -2867,10 +2818,15 @@ void TemperedLB::swapClusters() {
   auto const this_node = theContext()->getNode();
 
   NodeInfo this_info{
-    this_new_load_, this_new_work_,
-    this_new_breakdown_.inter_send_vol, this_new_breakdown_.inter_recv_vol,
-    this_new_breakdown_.intra_send_vol, this_new_breakdown_.intra_recv_vol,
-    this_new_breakdown_.shared_vol, getSharedBlocksHere(), non_cluster_objs_
+    this_new_load_,
+    this_new_work_,
+    this_new_breakdown_.inter_send_vol,
+    this_new_breakdown_.inter_recv_vol,
+    this_new_breakdown_.intra_send_vol,
+    this_new_breakdown_.intra_recv_vol,
+    this_new_breakdown_.shared_vol,
+    getSharedBlocksHere(),
+    non_cluster_objs_
   };
 
   auto criterion = [&,this](
@@ -2922,12 +2878,11 @@ void TemperedLB::swapClusters() {
           found_potential_good_swap = true;
 
           vt_debug_print(
-            normal, temperedlb,
-	    "try lock rank={}, c_try={}\n", try_rank, c_try
+            normal, temperedlb, "try lock rank={}, c_try={}\n", try_rank, c_try
           );
 
           proxy_[try_rank].template send<&TemperedLB::tryLock>(this_node, c_try);
-	  try_locks_pending_++;
+          try_locks_pending_++;
           break;
         }
       }
@@ -2941,12 +2896,11 @@ void TemperedLB::swapClusters() {
           found_potential_good_swap = true;
 
           vt_debug_print(
-            normal, temperedlb,
-	    "try lock rank={}, c_try={}\n", try_rank, c_try
+            normal, temperedlb, "try lock rank={}, c_try={}\n", try_rank, c_try
           );
 
           proxy_[try_rank].template send<&TemperedLB::tryLock>(this_node, c_try);
-	  try_locks_pending_++;
+          try_locks_pending_++;
           break;
         }
       } // try_clusters
@@ -2960,11 +2914,10 @@ void TemperedLB::swapClusters() {
   satisfyLockRequest();
 
   vt_debug_print(
-    normal, temperedlb,
-    "try locks pending={}\n", try_locks_pending_
+    normal, temperedlb, "try locks pending={}\n", try_locks_pending_
   );
 
-  theSched()->runSchedulerWhile([this]{
+  theSched()->runSchedulerWhile([this] {
     if (not is_locked_) {
       satisfyLockRequest();
     }
@@ -2973,7 +2926,7 @@ void TemperedLB::swapClusters() {
 
   done_with_swaps_ = false;
   proxy_.allreduce<&TemperedLB::finishedSwaps>();
-  theSched()->runSchedulerWhile([this]{ return not done_with_swaps_; });
+  theSched()->runSchedulerWhile([this] { return not done_with_swaps_; });
 
   vt_debug_print(
     normal, temperedlb,
diff --git a/src/vt/vrt/collection/balance/temperedlb/temperedlb.h b/src/vt/vrt/collection/balance/temperedlb/temperedlb.h
index f6f73517b..f4e017595 100644
--- a/src/vt/vrt/collection/balance/temperedlb/temperedlb.h
+++ b/src/vt/vrt/collection/balance/temperedlb/temperedlb.h
@@ -153,19 +153,18 @@ public:
     send_edges_ = in_send_edges;
   }
 
-  void setObjSharedBlock(
-    std::unordered_map<ObjIDType, SharedIDType> const& in_shared
+  void
+  setObjSharedBlock(std::unordered_map<ObjIDType, SharedIDType> const& in_shared
   ) {
     obj_shared_block_ = in_shared;
   }
-  void setSharedSize(
-    std::unordered_map<SharedIDType, BytesType> const& in_size
+  void setSharedSize(std::unordered_map<SharedIDType, BytesType> const& in_size
   ) {
     shared_block_size_ = in_size;
   }
   void setSharedEdge(
     std::unordered_map<SharedIDType, std::tuple<NodeType, BytesType>> const&
-    in_shared_edge
+      in_shared_edge
   ) {
     shared_block_edge_ = in_shared_edge;
   }
@@ -226,8 +225,7 @@ public:
    */
   void makeClusterSummaryAddEdges(
     SharedIDType shared_id, ClusterInfo& info,
-    std::set<ObjIDType> const& cluster_objs,
-    ObjIDType obj, bool is_send,
+    std::set<ObjIDType> const& cluster_objs, ObjIDType obj, bool is_send,
     std::vector<std::tuple<elm::ElementIDStruct, double>> const& edges
   );
 
@@ -356,8 +354,7 @@ public:
    * \return the amount of work currently for the set of objects
    */
   WorkBreakdown computeWorkBreakdown(
-    NodeType node,
-    std::unordered_map<ObjIDType, LoadType> const& objs,
+    NodeType node, std::unordered_map<ObjIDType, LoadType> const& objs,
     std::set<ObjIDType> const& exclude = {},
     std::unordered_map<ObjIDType, LoadType> const& include = {}
   ) const;
@@ -397,8 +394,7 @@ public:
     std::unordered_map<ObjIDType, LoadType> const& give_objs,
     std::unordered_map<ObjIDType, SharedIDType> const& give_obj_shared_block,
     std::unordered_map<ObjIDType, BytesType> const& give_obj_working_bytes,
-    EdgeMapType const& give_send,
-    EdgeMapType const& give_recv,
+    EdgeMapType const& give_send, EdgeMapType const& give_recv,
     SharedIDType take_cluster
   );
 
@@ -470,7 +466,7 @@ private:
   std::unordered_set<NodeType> underloaded_         = {};
   std::unordered_set<NodeType> new_underloaded_     = {};
   std::unordered_map<ObjIDType, LoadType> cur_objs_ = {};
-  std::set<ObjIDType> non_cluster_objs_             = {};
+  std::set<ObjIDType> non_cluster_objs_ = {};
   EdgeMapType send_edges_;
   EdgeMapType recv_edges_;
   LoadType this_new_load_                           = 0.0;
@@ -537,7 +533,8 @@ private:
 
     double operator<(TryLock const& other) const {
       // sort in reverse order so the best is first!
-      return c_try == other.c_try ? requesting_node < other.requesting_node : c_try > other.c_try;
+      return c_try == other.c_try ? requesting_node < other.requesting_node :
+                                    c_try > other.c_try;
     }
   };
 
diff --git a/tests/unit/lb/test_temperedlb.cc b/tests/unit/lb/test_temperedlb.cc
index 5f32e428e..0febab30d 100644
--- a/tests/unit/lb/test_temperedlb.cc
+++ b/tests/unit/lb/test_temperedlb.cc
@@ -61,11 +61,10 @@ std::string writeTemperedLBConfig(
   auto config_file = getUniqueFilename();
   if (this_node == 0) {
     std::ofstream cfg_file_{config_file.c_str(), std::ofstream::out | std::ofstream::trunc};
-    cfg_file_ << "0 TemperedLB converge_tolerance=0.001 iters=10 trials=3 transfer=" << transfer_strategy <<
-      " alpha=" << alpha <<
-      " beta=" << beta <<
-      " gamma=" << gamma <<
-      " delta=" << delta;
+    cfg_file_
+      << "0 TemperedLB converge_tolerance=0.001 iters=10 trials=3 transfer="
+      << transfer_strategy << " alpha=" << alpha << " beta=" << beta
+      << " gamma=" << gamma << " delta=" << delta;
     if (transfer_strategy == "SwapClusters") {
         cfg_file_ << " memory_threshold=" << memory_threshold;
     }
diff --git a/tests/unit/lb/test_temperedlb_work_calc.cc b/tests/unit/lb/test_temperedlb_work_calc.cc
index 7fe3dd944..5ea0f8fa0 100644
--- a/tests/unit/lb/test_temperedlb_work_calc.cc
+++ b/tests/unit/lb/test_temperedlb_work_calc.cc
@@ -95,8 +95,7 @@ using TestTemperedLBWorkCalc = TestParallelHarness;
 
 **/
 
-auto getObjLoads(
-  std::unordered_map<elm::ElementIDStruct, LoadType> const& map
+auto getObjLoads(std::unordered_map<elm::ElementIDStruct, LoadType> const& map
 ) -> double {
   double load = 0;
   for (auto const& [_, value] : map) {
@@ -108,8 +107,8 @@ auto getObjLoads(
 auto getNodeEdges(
   std::unordered_map<elm::ElementIDStruct, LoadType> const& objs,
   std::unordered_map<
-    elm::ElementIDStruct, std::vector<std::tuple<elm::ElementIDStruct, double>>
-  > const& edge_map,
+    elm::ElementIDStruct,
+    std::vector<std::tuple<elm::ElementIDStruct, double>>> const& edge_map,
   bool off
 ) -> double {
   double total_bytes = 0;
@@ -133,10 +132,11 @@ auto getNodeEdges(
 auto computeOffHomeVolume(
   vt::NodeType node,
   std::unordered_map<elm::ElementIDStruct, LoadType> const& objs,
-  std::unordered_map<elm::ElementIDStruct, SharedIDType> const& obj_shared_block,
+  std::unordered_map<elm::ElementIDStruct, SharedIDType> const&
+    obj_shared_block,
   std::unordered_map<
-    SharedIDType, std::tuple<NodeType, vt::vrt::collection::lb::BytesType>
-  > const& shared_edge
+    SharedIDType,
+    std::tuple<NodeType, vt::vrt::collection::lb::BytesType>> const& shared_edge
 ) -> double {
   double total_bytes = 0;
   std::set<SharedIDType> shared_here;
@@ -173,8 +173,7 @@ auto getObjLoad(
 }
 
 auto testClusterSwap(
-  TemperedLB* tlb,
-  NodeType rank,
+  TemperedLB* tlb, NodeType rank,
   std::unordered_map<NodeType, WorkBreakdown> work_init,
   std::unordered_map<SharedIDType, ClusterInfo> cluster_info,
   std::map<NodeType, std::unordered_map<ObjIDType, LoadType>> cur_objs,
@@ -209,8 +208,10 @@ auto testClusterSwap(
   NodeInfo ni{
     getObjLoads(cur_objs_add_remove),
     work_init[rank].work,
-    work_init[rank].inter_send_vol, work_init[rank].inter_recv_vol,
-    work_init[rank].intra_send_vol, work_init[rank].intra_recv_vol,
+    work_init[rank].inter_send_vol,
+    work_init[rank].inter_recv_vol,
+    work_init[rank].intra_send_vol,
+    work_init[rank].intra_recv_vol,
     work_init[rank].shared_vol,
     blocks_here_initial,
     non_cluster_objs
@@ -230,7 +231,6 @@ auto testClusterSwap(
   vt_print(gen, "new_work={}, wb2={}\n", new_work, wb2.work);
 }
 
-
 TEST_F(TestTemperedLBWorkCalc, test_work_calc_1) {
   using BytesType = vt::vrt::collection::lb::BytesType;
 
@@ -249,13 +249,7 @@ TEST_F(TestTemperedLBWorkCalc, test_work_calc_1) {
   // Test some arbitrary values of each term to make sure they are computed
   // correctly
   double work = tlb->computeWork(1000, 100, 50, 20);
-  EXPECT_EQ(
-    work,
-    alpha * 1000 +
-    beta  * 100 +
-    gamma * 50 +
-    delta * 20
-  );
+  EXPECT_EQ(work, alpha * 1000 + beta * 100 + gamma * 50 + delta * 20);
 
   // rank 1 objects
   auto o1 = elm::ElmIDBits::createCollectionImpl(true, 1, 1, 1);
@@ -277,56 +271,28 @@ TEST_F(TestTemperedLBWorkCalc, test_work_calc_1) {
 
   // clusters
   std::unordered_map<ObjIDType, SharedIDType> obj_shared_block = {
-    {o1, 1}, {o2, 1},
-    {o3, 2},
-    {o4, 3},
-    {o5, 4}, {o6, 4},
-    {o7, 5}, {o8, 5}, {o9, 5}
+    {o1, 1}, {o2, 1}, {o3, 2}, {o4, 3}, {o5, 4},
+    {o6, 4}, {o7, 5}, {o8, 5}, {o9, 5}
   };
 
   std::unordered_map<SharedIDType, BytesType> shared_block_size = {
-    {1, 100},
-    {2, 200},
-    {3, 150},
-    {4, 300},
-    {5, 100}
+    {1, 100}, {2, 200}, {3, 150}, {4, 300}, {5, 100}
   };
 
   std::map<NodeType, std::unordered_map<ObjIDType, LoadType>> cur_objs = {
-    {1,
-      {{o1, 20},
-       {o2, 5},
-       {o3, 10},
-       {o10, 3}
-      }
-    },
-    {2,
-     {{o4, 30},
-      {o12, 9}
-     }
-    },
-    {3,
-      {{o5, 10},
-       {o6, 15},
-       {o7, 3},
-       {o8, 2},
-       {o9, 8},
-       {o11, 1}
-      }
-    }
+    {1, {{o1, 20}, {o2, 5}, {o3, 10}, {o10, 3}}},
+    {2, {{o4, 30}, {o12, 9}}},
+    {3, {{o5, 10}, {o6, 15}, {o7, 3}, {o8, 2}, {o9, 8}, {o11, 1}}}
   };
 
-  std::unordered_map<SharedIDType, std::tuple<NodeType, BytesType>> shared_edge = {
-    {1, {1, 100}},
-    {2, {1, 200}},
-    {3, {2, 150}},
-    {4, {3, 300}},
-    {5, {3, 100}}
-  };
+  std::unordered_map<SharedIDType, std::tuple<NodeType, BytesType>>
+    shared_edge = {
+      {1, {1, 100}}, {2, {1, 200}}, {3, {2, 150}}, {4, {3, 300}}, {5, {3, 100}}
+    };
 
   std::unordered_map<
-    elm::ElementIDStruct, std::vector<std::tuple<elm::ElementIDStruct, double>>
-  > send_edges, recv_edges;
+    elm::ElementIDStruct, std::vector<std::tuple<elm::ElementIDStruct, double>>>
+    send_edges, recv_edges;
 
   send_edges = {
     {o1, {{o2, 10}, {o5, 10}}},
@@ -374,26 +340,33 @@ TEST_F(TestTemperedLBWorkCalc, test_work_calc_1) {
     EXPECT_NEAR(
       wb.work,
       alpha * getObjLoads(cur_objs[rank]) +
-      beta * std::max(
-        getNodeEdges(cur_objs[rank], send_edges, true),
-        getNodeEdges(cur_objs[rank], recv_edges, true)
-      ) +
-      gamma * std::max(
-        getNodeEdges(cur_objs[rank], send_edges, false),
-        getNodeEdges(cur_objs[rank], recv_edges, false)
-      ) +
-      delta * computeOffHomeVolume(
-        rank, cur_objs[rank], obj_shared_block, shared_edge
-      ),
+        beta *
+          std::max(
+            getNodeEdges(cur_objs[rank], send_edges, true),
+            getNodeEdges(cur_objs[rank], recv_edges, true)
+          ) +
+        gamma *
+          std::max(
+            getNodeEdges(cur_objs[rank], send_edges, false),
+            getNodeEdges(cur_objs[rank], recv_edges, false)
+          ) +
+        delta *
+          computeOffHomeVolume(
+            rank, cur_objs[rank], obj_shared_block, shared_edge
+          ),
       FLT_EPSILON
     );
 
     node_info[rank] = NodeInfo{
       getObjLoads(cur_objs[rank]),
       wb.work,
-      wb.inter_send_vol, wb.inter_recv_vol,
-      wb.intra_send_vol, wb.intra_recv_vol,
-      wb.shared_vol, tlb->getSharedBlocksHere(), {}
+      wb.inter_send_vol,
+      wb.inter_recv_vol,
+      wb.intra_send_vol,
+      wb.intra_recv_vol,
+      wb.shared_vol,
+      tlb->getSharedBlocksHere(),
+      {}
     };
 
     for (auto const& shared_id : tlb->getSharedBlocksHere()) {
@@ -458,18 +431,17 @@ TEST_F(TestTemperedLBWorkCalc, test_work_calc_1) {
         for (auto const& shared_id_add : to_try_add) {
           vt_print(
             temperedlb,
-            "try swap: rank={}, shared_id_remove={}, shared_id_add={}\n",
-            rank, shared_id_remove, shared_id_add
+            "try swap: rank={}, shared_id_remove={}, shared_id_add={}\n", rank,
+            shared_id_remove, shared_id_add
           );
           testClusterSwap(
-            tlb.get(), rank, work_init, cluster_info, cur_objs, obj_shared_block,
-            shared_id_remove, shared_id_add
+            tlb.get(), rank, work_init, cluster_info, cur_objs,
+            obj_shared_block, shared_id_remove, shared_id_add
           );
         }
       }
     }
   }
-
 }
 
 #endif /* vt_check_enabled(lblite) */
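
For readers of the help strings in the diff above: the α, β, γ, and δ terms combine the way the expectations in tests/unit/lb/test_temperedlb_work_calc.cc check them. A minimal sketch of that combination (illustrative names and signature, not TemperedLB::computeWork itself):

#include <algorithm>

// Sketch of the documented work model: alpha weights the modeled load, beta
// the larger of the inter-rank send/recv volumes, gamma the larger of the
// intra-rank send/recv volumes, and delta the off-home shared-memory volume.
double workModelSketch(
  double alpha, double beta, double gamma, double delta,
  double load,
  double inter_send, double inter_recv,
  double intra_send, double intra_recv,
  double shared_vol
) {
  return alpha * load +
    beta  * std::max(inter_send, inter_recv) +
    gamma * std::max(intra_send, intra_recv) +
    delta * shared_vol;
}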

@lifflander
Collaborator Author

@ppebay and I are discussing some changes to detecting convergence, so we are not going to merge this until we test those changes and integrate them here.

@lifflander lifflander marked this pull request as draft June 9, 2025 21:44
@lifflander lifflander force-pushed the 2435-temperedlb-improvementsoptimizations branch from 16df4e9 to ec3f27f Compare June 26, 2025 17:22
@lifflander lifflander marked this pull request as ready for review June 26, 2025 21:58
@lifflander lifflander force-pushed the 2435-temperedlb-improvementsoptimizations branch 2 times, most recently from 41baf01 to 693d3d8 Compare July 3, 2025 15:40
@ppebay ppebay self-requested a review July 7, 2025 17:21
@lifflander lifflander force-pushed the 2435-temperedlb-improvementsoptimizations branch 2 times, most recently from e3f4bbf to 53a4d6e Compare July 28, 2025 20:38
cz4rs previously requested changes Jul 29, 2025

@cz4rs cz4rs left a comment


As far as I can understand the changes, this looks correct. I will try to give test_temperedlb_work_calc.cc another look tomorrow to improve my understanding a bit more 😉

The general impression is that the already complex TemperedLB code is getting even more complex. Any effort to extract helper methods and improve abstraction-level consistency would be appreciated. Some parts of the code could use lower-level unit testing, although this might require extracting whole helper classes / components.

Comment on lines 400 to 406
k_max_ = static_cast<uint16_t>(2);
f_ = std::max(
  static_cast<uint16_t>(1),
  static_cast<uint16_t>(
    std::ceil(std::sqrt(std::log(num_nodes)/std::log(2.0)))
  )
);

@ppebay Can you have a look at these formulas?
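
For reference, the "knowledge" help text earlier in the diff states the intended relationship rounds = log(num_ranks)/log(fanout), while the snippet above fixes k_max_ at 2 and derives f_ from ceil(sqrt(log2(num_nodes))). A minimal sketch of the documented relationship only (hypothetical helper, not the inputParams() code under review):

#include <cmath>
#include <cstddef>
#include <cstdint>

// With fanout f, information reaches roughly f^k ranks after k propagation
// rounds, so k = ceil(log(N)/log(f)) rounds cover N ranks. Assumes fanout > 1;
// names are illustrative.
uint8_t roundsForFanout(std::size_t num_ranks, uint16_t fanout) {
  return static_cast<uint8_t>(std::ceil(
    std::log(static_cast<double>(num_ranks)) /
    std::log(static_cast<double>(fanout))
  ));
}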

Comment on lines 1100 to 1118
for (auto const& [shared_id, volume] : to_add.inter_cluster_send_vol) {
  // Local cluster edge if it's in our shared IDs, otherwise it's remote
  if (shared_id == to_add.shared_id) {
    node_intra_send += volume;
  } else if (cur_shared_ids.find(shared_id) != cur_shared_ids.end()) {
    node_intra_send += volume;
    node_intra_recv += volume;

    node_inter_recv -= volume;
  } else {
    node_inter_send += volume;
  }
}

for (auto const& [shared_id, volume] : to_add.inter_cluster_recv_vol) {
  // Local cluster edge if it's in our shared IDs, otherwise it's remote
  if (shared_id == to_add.shared_id) {
    node_intra_recv += volume;
  } else if (cur_shared_ids.find(shared_id) != cur_shared_ids.end()) {
    node_intra_recv += volume;
    node_intra_send += volume;

    node_inter_send -= volume;
  } else {
    node_inter_recv += volume;
  }
}

Close to code duplication.

Comment on lines +1119 to +1114
node_intra_recv += volume;
node_intra_send += volume;

node_inter_send -= volume;

Extracting this (and other variants of +/- volume) into named helpers could help readability.

Comment on lines +1061 to +1056
node_intra_recv -= volume;
node_intra_send -= volume;

node_inter_send += volume;

Extract into a helper instead of adding a comment.
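
A minimal sketch of what such named helpers could look like (hypothetical names and a hypothetical VolumeTotals aggregate, not code from this PR); both loops in the larger snippet above would then call a single helper with a flag selecting which inter-rank counter to correct:

// Both code paths adjust the same four running totals and differ only in
// which "inter" counter is corrected, so they can share one helper.
struct VolumeTotals {
  double intra_send = 0.0;
  double intra_recv = 0.0;
  double inter_send = 0.0;
  double inter_recv = 0.0;
};

// An edge previously counted as inter-rank becomes intra-rank once the
// swapped-in cluster is local: add it to both intra sides and back it out of
// the opposite inter counter.
inline void reclassifyInterAsIntra(
  VolumeTotals& v, double volume, bool from_send_side
) {
  v.intra_send += volume;
  v.intra_recv += volume;
  (from_send_side ? v.inter_recv : v.inter_send) -= volume;
}

// The reverse adjustment, covering the -=/+= variant flagged in the previous
// comment: an edge that was intra-rank becomes inter-rank when a cluster
// leaves the node.
inline void reclassifyIntraAsInter(
  VolumeTotals& v, double volume, bool to_send_side
) {
  v.intra_send -= volume;
  v.intra_recv -= volume;
  (to_send_side ? v.inter_send : v.inter_recv) += volume;
}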

@cz4rs
Contributor

cz4rs commented Jul 29, 2025

Codacy complains about duplication too:
https://app.codacy.com/gh/DARMA-tasking/vt/pull-requests/2436/duplication

@lifflander lifflander force-pushed the 2435-temperedlb-improvementsoptimizations branch from 30d8d08 to 23865f3 Compare August 29, 2025 00:31
@github-actions

github-actions bot commented Aug 29, 2025

Pipelines results

Build for 65ca237 (2025-10-01 19:16:25 UTC). Compilation – successful; Testing – passed for each of the following jobs:

vt-build-amd64-ubuntu-20-04-clang-9-cpp
vt-build-amd64-alpine-3-16-clang-cpp
vt-build-amd64-ubuntu-24-04-gcc-14-cpp
vt-build-amd64-ubuntu-20-04-gcc-10-cpp
vt-build-amd64-ubuntu-20-04-gcc-10-openmpi-cpp
vt-build-amd64-ubuntu-20-04-gcc-9-ldms-cpp
vt-build-amd64-ubuntu-22-04-clang-13-cpp
vt-build-amd64-ubuntu-22-04-clang-11-cpp
vt-build-amd64-ubuntu-22-04-clang-12-cpp
vt-build-amd64-ubuntu-24-04-clang-16-zoltan-cpp
vt-build-amd64-ubuntu-22-04-gcc-12-vtk-cpp
vt-build-amd64-ubuntu-20-04-gcc-10-openmpi-cpp-spack
vt-build-amd64-ubuntu-22-04-gcc-12-cpp
vt-build-amd64-ubuntu-20-04-icpx-cpp
vt-build-amd64-ubuntu-22-04-gcc-11-cpp
vt-build-amd64-ubuntu-24-04-gcc-13-cpp
vt-build-amd64-ubuntu-20-04-gcc-9-cpp
vt-build-amd64-ubuntu-22-04-clang-15-cpp

vt-build-amd64-ubuntu-20-04-gcc-9-cuda-11-4-3-cpp

Build for 65ca237 (2025-10-01 19:16:25 UTC)

/vt/src/vt/runtime/runtime.cc(1117): warning: parameter "comm" was declared but never referenced

/vt/src/vt/runtime/component/component_pack.impl.h(55): warning: parameter "seq" was declared but never referenced
          detected during:
            instantiation of "std::unique_ptr<T, std::default_delete<T>> vt::runtime::component::<unnamed>::tupleCons<T,Tuple>(Tuple &&) [with T=vt::arguments::ArgConfig, Tuple=std::tuple<std::unique_ptr<vt::arguments::ArgConfig, std::default_delete<vt::arguments::ArgConfig>>>]" 
(104): here
            instantiation of "vt::runtime::component::registry::AutoHandlerType vt::runtime::component::ComponentPack::registerComponent(T **, vt::runtime::component::BaseComponent::StartupDepsPack<StartupDeps...>, vt::runtime::component::BaseComponent::RuntimeDepsPack<RuntimeDeps...>, Cons &&...) [with T=vt::arguments::ArgConfig, StartupDeps=<>, RuntimeDeps=<>, Cons=<std::unique_ptr<vt::arguments::ArgConfig, std::default_delete<vt::arguments::ArgConfig>>>]" 
/vt/src/vt/runtime/runtime.cc(770): here

/vt/src/vt/runtime/component/component_pack.impl.h(55): warning: parameter "tup" was declared but never referenced
          detected during:
            instantiation of "std::unique_ptr<T, std::default_delete<T>> vt::runtime::component::<unnamed>::tupleConsImpl<T,Tuple,I...>(Tuple &&, std::index_sequence<I...>) [with T=vt::util::memory::MemoryUsage, Tuple=std::tuple<>, I=<>]" 
(64): here
            instantiation of "std::unique_ptr<T, std::default_delete<T>> vt::runtime::component::<unnamed>::tupleCons<T,Tuple>(Tuple &&) [with T=vt::util::memory::MemoryUsage, Tuple=std::tuple<>]" 
(104): here
            instantiation of "vt::runtime::component::registry::AutoHandlerType vt::runtime::component::ComponentPack::registerComponent(T **, vt::runtime::component::BaseComponent::StartupDepsPack<StartupDeps...>, vt::runtime::component::BaseComponent::RuntimeDepsPack<RuntimeDeps...>, Cons &&...) [with T=vt::util::memory::MemoryUsage, StartupDeps=<vt::ctx

 ==> And there is more. Read log. <==

View job log


Build for 65ca237 (2025-10-01 19:16:25 UTC). Compilation – successful; Testing – passed for each of the following jobs:

vt-build-amd64-ubuntu-24-04-clang-18-cpp
vt-build-amd64-ubuntu-20-04-clang-10-cpp
vt-build-amd64-ubuntu-24-04-clang-17-cpp
vt-build-amd64-ubuntu-24-04-clang-16-vtk-cpp

vt-build-amd64-ubuntu-20-04-gcc-9-cuda-12-2-0-cpp

Build for 65ca237 (2025-10-01 19:16:25 UTC)

/vt/lib/fmt/include/fmt-vt/base.h(472): warning #128-D: loop is not reachable
    for (; n != 0; ++s1, ++s2, --n) {
    ^
          detected during:
            instantiation of "auto fmt::v11::detail::compare(const Char *, const Char *, std::size_t)->int [with Char=char]" at line 591
            instantiation of "auto fmt::v11::basic_string_view<Char>::compare(fmt::v11::basic_string_view<Char>) const->int [with Char=char]" at line 598
            instantiation of class "fmt::v11::basic_string_view<Char> [with Char=char]" at line 2621
            instantiation of "auto fmt::v11::basic_format_args<Context>::get_id(fmt::v11::basic_string_view<Char>) const->int [with Context=fmt::v11::context, Char=char]" at line 2664

Remark: The warnings can be suppressed with "-diag-suppress <warning-number>"

/vt/lib/fmt/include/fmt-vt/format.h(2596): warning #2417-D: constexpr constructor calls non-constexpr function "fmt::v11::basic_memory_buffer<T, SIZE, Allocator>::basic_memory_buffer(const Allocator &) [with T=fmt::v11::detail::bigint::bigit, SIZE=32UL, Allocator=fmt::v11::detail::allocator<fmt::v11::detail::bigint::bigit>]"
    constexpr bigint() : exp_(0) {}
                                 ^

/vt/lib/fmt/include/fmt-vt/base.h(472): warning #128-D: loop is not reachable
    for (; n != 0; ++s1, ++s2, --n) {
    ^
          detected during:
            instantiation of "auto fmt::v11::detail::compare(const Char *, const Char *, std::size_t)->int [with Char=char]" at line 591
            instantiation of "auto fmt::v11::basic_string_view<Char>::compare(fmt::v11::basic_string_view<Char>) const->int [with Char=char]" at line 598
            instantiation of class "fmt::v11::basic_string_view<Char> [with Char=char]" at line 2621
            instantiation of "auto fmt::v11::basic_format_args<Context>::get_id(fmt::v11::basic_string_view<Char>) const->int [with Context=fmt::v11::context, Char=char]" at line 2664

Remark: The warnings can be suppressed with "-diag-suppress <warning-n

 ==> And there is more. Read log. <==

View job log


vt-build-amd64-ubuntu-22-04-clang-14-cpp

Build for 65ca237 (2025-10-01 19:16:25 UTC). Compilation – successful; Testing – passed.


vt-build-amd64-ubuntu-20-04-gcc-9-cpp-docs

Build for d443fbe (2025-10-01 19:11:03 UTC). Compilation – successful; Testing – passed.

@codecov

codecov bot commented Aug 29, 2025

Codecov Report

❌ Patch coverage is 90.25641% with 38 lines in your changes missing coverage. Please review.
✅ Project coverage is 88.55%. Comparing base (87107a1) to head (183c309).
⚠️ Report is 29 commits behind head on develop.

Files with missing lines Patch % Lines
...vt/vrt/collection/balance/temperedlb/temperedlb.cc 84.81% 36 Missing ⚠️
tests/unit/lb/test_temperedlb_work_calc.cc 98.38% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2436      +/-   ##
===========================================
+ Coverage    88.35%   88.55%   +0.20%     
===========================================
  Files          727      728       +1     
  Lines        31256    31523     +267     
===========================================
+ Hits         27616    27916     +300     
+ Misses        3640     3607      -33     
Files with missing lines Coverage Δ
.../vrt/collection/balance/temperedlb/tempered_msgs.h 98.46% <100.00%> (+0.42%) ⬆️
.../vt/vrt/collection/balance/temperedlb/temperedlb.h 100.00% <100.00%> (ø)
tests/unit/lb/test_temperedlb.cc 100.00% <ø> (ø)
tests/unit/lb/test_temperedlb_work_calc.cc 98.38% <98.38%> (ø)
...vt/vrt/collection/balance/temperedlb/temperedlb.cc 82.60% <84.81%> (+3.90%) ⬆️

... and 8 files with indirect coverage changes


@lifflander lifflander force-pushed the 2435-temperedlb-improvementsoptimizations branch from 23865f3 to 2976892 Compare October 1, 2025 17:47
@nlslatt nlslatt dismissed cz4rs’s stale review October 1, 2025 20:25

Most comments were addressed. Code duplication will be addressed as part of a separate PR that will involve substantial refactoring.

@lifflander lifflander merged commit 0a685ed into develop Oct 1, 2025
35 of 37 checks passed


Development

Successfully merging this pull request may close these issues.

TemperedLB improvements/optimizations

4 participants